Language ID
Welcome to our model card for Language ID. This model card describes our currently deployed language ID model available via our API.
Model Card
Lelapa-X-LID (isiZulu, seSotho, Afrikaans, ZA English)
Model Details
Basic information about the model: Review section 4.1 of the model cards paper.
Organization | Lelapa AI |
---|---|
Product | Vulavula |
Model date | 7 November 2023 |
Feature | Audio Language Identification |
Languages | isiZulu, seSotho, Afrikaans, ZA English |
Domain | General, Call Center |
Model Name | Lelapa-X-LID (isiZulu, seSotho, Afrikaans, ZA English) |
Model version | 1.0.0 |
Model Type | Proprietary Model |
Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: Proprietary Model Trained on Audio Data
License: Proprietary
Contact: info@lelapa.ai
Intended use
Use cases envisioned during development: Review section 4.2 of the model cards paper.
Primary intended uses
Intended use is governed by the language and domain of the model. The model is intended to be used for the task of language identification of audio. The model is trained on audio data containing isiZulu, seSotho, Afrikaans, or ZA English, in the general and call center domains.
Primary intended users
The Language Identification model can be used for:
- Language Identification in the call center domain
- Improved ASR performance (illustrated in the sketch after this list)
- Market Research and Analysis
- Compliance monitoring for Customer Interactions
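To make the call center and ASR-routing use cases concrete, the following is a minimal hypothetical sketch of how a downstream system could use the model's language label to route an utterance to a language-specific ASR model. The `identify_language` and `transcribe` callables and the ASR model names are placeholders assumed for this illustration; they are not part of the Vulavula API.

```python
# Hypothetical routing of call center audio to a language-specific ASR model
# based on the label returned by a language identification (LID) model.
# All names below are placeholders, not real Vulavula identifiers.

ASR_MODELS = {
    "isiZulu": "asr-zul",
    "seSotho": "asr-sot",
    "Afrikaans": "asr-afr",
    "ZA English": "asr-eng",
}

def route_to_asr(audio_path, identify_language, transcribe):
    """Identify the language of an utterance, then transcribe it with the
    matching ASR model.

    identify_language(audio_path) -> one of the four supported language names
    transcribe(audio_path, asr_model_name) -> transcript string
    Both callables are assumed to be supplied by the caller.
    """
    language = identify_language(audio_path)
    if language not in ASR_MODELS:
        raise ValueError(f"Unsupported language: {language}")
    return language, transcribe(audio_path, ASR_MODELS[language])
```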
Out-of-scope use cases
Any use other than language identification of isiZulu, seSotho, Afrikaans, and ZA English audio is out of scope; this includes all other languages and domains.
Factors
Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others: Review section 4.3 of the model cards paper.
Relevant factors
Groups:
- Users who recorded the utterances used to train the model are diverse across several factors such as age, location (primarily South Africa, with speakers drawn from several regions of the country depending on the language), and gender (male and female speakers are equally distributed). There is no record of speakers' social class, health conditions, names, or any other private details. Further details on the groups and their constituents can be found in the datasheet.
- Evaluation of performance across groups is underway.
Environmental conditions, Instrumentation and technical attributes:
- Audio utterances are recorded in environments such as rooms and call centers with a noiseless background.
- Audio segments’ length varies from 3 seconds to 30-40 minutes.
Metrics
The appropriate metrics to feature in a model card depend on the model being tested. For example, classification systems in which the primary output is a class label differ significantly from systems whose primary output is a score. In all cases, the reported metrics should be determined based on the model’s structure and intended use: Review section 4.4 of the model cards paper.
Model performance measures
The model is evaluated using accuracy. As an automatic metric, accuracy is used in statistics and machine learning to evaluate the correctness of a classification model. For this multi-class language identification task, accuracy is the proportion of test utterances whose predicted language matches the true language, i.e. the number of correct predictions divided by the total number of predictions.
Accuracy: tested on a LID test set containing isiZulu, seSotho, Afrikaans, and ZA English audio files.
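For reference, the following is a minimal sketch of how overall and per-language accuracy can be computed from a labeled test set. The language codes and data structures are illustrative assumptions, not part of the released evaluation code.

```python
from collections import defaultdict

def per_language_accuracy(true_labels, predicted_labels):
    """Compute overall and per-language accuracy for a LID test set.

    true_labels / predicted_labels: parallel lists of language codes,
    e.g. "zul" (isiZulu), "sot" (seSotho), "afr" (Afrikaans), "eng" (ZA English).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    per_language = {lang: correct[lang] / total[lang] for lang in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_language

# Toy example (not real evaluation data):
overall, per_language = per_language_accuracy(
    ["zul", "afr", "sot", "eng", "zul"],
    ["zul", "afr", "eng", "eng", "zul"],
)
print(overall)       # 0.8
print(per_language)  # {'zul': 1.0, 'afr': 1.0, 'sot': 0.0, 'eng': 1.0}
```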
Decision thresholds
No decision thresholds have been specified.
Evaluation data
All referenced datasets would ideally point to any set of documents that provide visibility into the source and composition of the dataset. Evaluation datasets should include datasets that are publicly available for third-party use. These could be existing datasets or new ones provided alongside the model card analyses to enable further benchmarking.
Review section 4.5 of the model cards paper.
Datasets
- Proprietary call center dataset
Motivation
These datasets have been selected because they are open-source, high-quality, and cover the targeted languages, and because the utterances were recorded by a variety of speakers living in the required regions. This helps capture cultural and linguistic aspects that are crucial for better performance during development.
Preprocessing
Data utterances are initially filtered by audio length, then sampled and normalized. We also make sure to select genuine recordings, i.e. recordings that are not just noise or null.
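As an illustration only, the following is a minimal sketch of this kind of preprocessing: filtering by audio length, resampling, amplitude normalization, and rejecting recordings that are just noise or null. The thresholds, the target sample rate, the use of the librosa library, and the reading of "sampled" as resampling to a common rate are all assumptions made for this example; they do not describe the actual proprietary pipeline.

```python
import numpy as np
import librosa

TARGET_SR = 16_000                         # assumed target sample rate
MIN_SECONDS, MAX_SECONDS = 3.0, 40 * 60    # assumed length bounds (3 s to 40 min)
SILENCE_RMS_THRESHOLD = 1e-3               # assumed cut-off for "noise or null" recordings

def preprocess(path):
    """Load an utterance, filter it by length and energy, resample, and peak-normalize.

    Returns the processed waveform, or None if the recording is rejected.
    """
    audio, sr = librosa.load(path, sr=None, mono=True)
    duration = len(audio) / sr

    # Filter by audio length.
    if not (MIN_SECONDS <= duration <= MAX_SECONDS):
        return None

    # Reject recordings that are effectively silent ("noise or null").
    if np.sqrt(np.mean(audio ** 2)) < SILENCE_RMS_THRESHOLD:
        return None

    # Resample to a common rate and peak-normalize the amplitude.
    audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    return audio / (np.max(np.abs(audio)) + 1e-9)
```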
Training data
Review section 4.6 of the model cards paper.
Refer to the datasheet provided.
Quantitative analyses
Quantitative analyses should be disaggregated, that is, broken down by the chosen factors. Quantitative analyses should provide the results of evaluating the model according to the chosen metrics, providing confidence interval values when possible.
Review section 4.7 of the model cards paper.
Unitary results
Accuracy per language:
Model | Afrikaans | ZA English | seSotho | isiZulu |
---|---|---|---|---|
Lelapa-X-LID | 93.73% | 87.45% | 86.98% | 91.30% |
Intersectional result
In progress
Ethical considerations
This section is intended to demonstrate the ethical considerations that went into model development, surfacing ethical challenges and solutions to stakeholders. The ethical analysis does not always lead to precise solutions, but the process of ethical contemplation is worthwhile to inform on responsible practices and next steps in future work: Review section 4.8 of the model cards paper.
All data is synthetic and so the model does not contain any personal information. More details are in the datasheet.
Caveats and recommendations
This section should list additional concerns that were not covered in the previous sections.
Review section 4.9 of the model cards paper.
Additional caveats are outlined extensively in our Terms and Conditions.